Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization

Author

Ansamma John

Abstract

Because information on the web is often spread across multiple documents, summarizing those documents and ordering the sentences of the summary efficiently has become a relevant task in data mining. We present a novel sentence ordering method, based on a maximum cost spanning tree algorithm, that improves the readability and cohesion of an extractive summary built from multiple related documents. Candidate sentences are extracted from the documents by ranking them with the cosine similarity measure, and redundancy in the summary is reduced with the Maximal Marginal Relevance (MMR) technique. The summary sentences are then organized by constructing a graph in which each sentence is a node and every pair of nodes is connected by an edge weighted by the similarity between the two sentences. The most important task in our work is to find the first sentence of the ordered summary, which we identify as the sentence that has minimum similarity with the sentences in the extracted summary. The remaining sentences are placed one by one using Prim's maximum cost spanning tree algorithm. The proposed algorithm was tested on the DUC 2002 data set, and the summary generated after ordering was found to have better readability and cohesion than the one generated without ordering. The results are more impressive as the summary size increases.

General Terms: Data Mining
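
As an illustration of the ordering step described in the abstract, the following is a minimal Python sketch, not the author's implementation. It assumes whitespace tokenization, raw term-frequency vectors, and that the order in which Prim's algorithm adds nodes to the maximum cost spanning tree is taken as the sentence order. The function names `cosine` and `order_sentences` are hypothetical, and the candidate-extraction and MMR steps are not shown.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_sentences(sentences: List[str]) -> List[int]:
    """Return sentence indices in the order a Prim-style maximum
    spanning tree construction visits them (assumed reading order)."""
    n = len(sentences)
    if n == 0:
        return []

    # Assumption: simple lowercase, whitespace-split term counts.
    vecs = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]

    # First sentence: the one with minimum total similarity to the others.
    start = min(range(n), key=lambda i: sum(sim[i]))
    order = [start]
    in_tree = {start}

    # best[j] holds the heaviest edge from sentence j to any placed sentence.
    best = list(sim[start])
    while len(order) < n:
        # Prim's step with max instead of min: attach the most similar node.
        nxt = max((j for j in range(n) if j not in in_tree),
                  key=lambda j: best[j])
        order.append(nxt)
        in_tree.add(nxt)
        for j in range(n):
            if j not in in_tree and sim[nxt][j] > best[j]:
                best[j] = sim[nxt][j]
    return order
```

For example, `order_sentences(["c d", "a b", "d e", "b c"])` returns `[1, 3, 0, 2]`, which places the toy sentences in the chain "a b", "b c", "c d", "d e". Ties are broken by input position, a choice the abstract does not specify.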

Similar resources

A preference learning approach to sentence ordering for multi-document summarization

Ordering information is a difficult but important task for applications that generate natural language texts, such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. Therefore, the optimal ordering of those selected pieces of information to create a coherent summary is not obv...

Sentence ordering with manifold-based classification in multi-document summarization

In this paper, we propose a sentence ordering algorithm using a semi-supervised sentence classification and historical ordering strategy. The classification is based on the manifold structure underlying sentences, addressing the problem of limited labeled data. The historical ordering helps to ensure topic continuity and avoid topic bias. Experiments demonstrate that the method is effective.

Significance of Sentence Ordering in Multi Document Summarization

Multi-document summarization represents information in a concise and comprehensive manner. In this paper we discuss the significance of sentence ordering in multi-document summarization. We show experimental results on the DUC2002 dataset. These results compare the summaries before sentence ordering with the improvement obtained after applying it. For this purpose we used a term frequency...

Sentence Clustering-based Summarization of Multiple Text Documents

With the rapid growth of the World Wide Web, information overload is becoming a problem for an increasingly large number of people. Automatic multi-document summarization can be an indispensable solution to the information overload problem on the web. This kind of summarization facility helps users see at a glance what a collection is about and provides a new way of managing a vast hoa...

Publication date: 2011
